21 research outputs found
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic
While significant progress has been made in benchmarking Large Language
Models (LLMs) across various tasks, there is a lack of comprehensive evaluation
of their abilities in responding to multi-turn instructions in less-commonly
tested languages like Arabic. Our paper offers a detailed examination of the
proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized
Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a
uniform evaluator for both English and Arabic queries to assess and compare the
performance of the LLMs on various open-ended tasks. Our findings reveal
variations in model responses on different task categories, e.g., logic vs.
literacy, when instructed in English or Arabic. We find that base models
fine-tuned on multilingual and multi-turn datasets can be competitive with
models trained from scratch on multilingual data. Finally, we hypothesize that
an ensemble of small, open LLMs could perform competitively with proprietary
LLMs on the benchmark.
Comment: Accepted at SIGARAB ArabicNLP 202
Policy space abstraction for a lifelong learning agent
This thesis is concerned with policy space abstractions that concisely encode alternative
ways of making decisions; dealing with discovery, learning, adaptation and use of these
abstractions. This work is motivated by the problem faced by autonomous agents that
operate within a domain for long periods of time, hence having to learn to solve many
different task instances that share some structural attributes. An example of such a
domain is an autonomous robot in a dynamic domestic environment. Such environments
raise the need for transfer of knowledge, so as to eliminate the need for long learning
trials after deployment.
Typically, these tasks would be modelled as sequential decision making problems,
including path optimisation for navigation tasks, or Markov Decision Process models for
more general tasks. Learning within such models often takes the form of online learning
or reinforcement learning. However, handling issues such as knowledge transfer and
multiple task instances requires notions of structure and hierarchy, and that raises several
questions that form the topic of this thesis: (a) can an agent acquire such
policy hierarchies in an online, incremental manner; (b) can we devise
mathematically rigorous ways to abstract policies based on qualitative
attributes; (c) when prolonged trial-and-error learning is impractical, can we
devise alternative algorithmic methods for decision making in a lifelong setting?
The first contribution of this thesis is an algorithmic method for incrementally
acquiring hierarchical policies. Working with the framework of options - temporally
extended actions - in reinforcement learning, we present a method for discovering
persistent subtasks that define useful options for a particular domain. Our algorithm
builds on a probabilistic mixture model in state space to define a generalised and
persistent form of ‘bottlenecks’, and suggests suitable policy fragments to make options.
In order to continuously update this hierarchy, we devise an incremental process which
runs in the background and takes care of proposing and forgetting options. We evaluate
this framework in simulated worlds, including the RoboCup 2D simulation league
domain.
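The generalised 'bottleneck' idea can be illustrated with a much cruder proxy than the thesis's probabilistic mixture model: treat as a bottleneck any state that appears in nearly every distinct trajectory through the domain. Everything below (the grid states, the 0.9 threshold, the name `find_bottlenecks`) is invented for illustration and is not the algorithm described above.

```python
from collections import Counter

def find_bottlenecks(trajectories, threshold=0.9):
    """Return states visited in at least `threshold` of the distinct
    trajectories, excluding the shared start/goal states."""
    n = len(trajectories)
    counts = Counter()
    for traj in trajectories:
        for s in set(traj):          # count each state once per trajectory
            counts[s] += 1
    starts = {t[0] for t in trajectories}
    goals = {t[-1] for t in trajectories}
    return {s for s, c in counts.items()
            if c / n >= threshold and s not in starts | goals}

# Two rooms joined by a doorway at (2, 1): every path must pass through it.
trajs = [
    [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 2), (4, 2)],
    [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (4, 2)],
    [(0, 2), (1, 2), (1, 1), (2, 1), (2, 2), (4, 2)],
]
print(find_bottlenecks(trajs))   # -> {(2, 1)}: the doorway
```

A persistent option would then be a policy fragment funnelling the agent towards such a state; the mixture-model formulation in the thesis additionally handles stochastic visitation and incremental updates, which this frequency count does not.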
The second contribution of this thesis is in defining abstractions in terms of equivalence
classes of trajectories. Utilising recently developed techniques from computational
topology, in particular the concept of persistent homology, we show that a library of
feasible trajectories could be retracted to representative paths that may be sufficient for
reasoning about plans at the abstract level. We present a complete framework, starting
from a novel construction of a simplicial complex that describes higher-order connectivity
properties of a spatial domain, to methods for computing the homology of this
complex at varying resolutions. The resulting abstractions are motion primitives that
may be used as topological options, contributing a novel criterion for option discovery.
This is validated by experiments in simulated 2D robot navigation, and in manipulation
using a physical robot platform.
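A drastically simplified stand-in for the idea of trajectory equivalence classes: classify left-to-right 2D paths by which side of a single point obstacle they pass, then keep one representative per class. The obstacle setup and function names are invented; real persistent homology (as in the thesis) handles arbitrary spatial domains and resolutions.

```python
def homotopy_class(traj, obstacle_x=0.0):
    """Classify a left-to-right path (list of (x, y) points) by which
    side of a point obstacle on the x-axis it crosses."""
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        if x0 <= obstacle_x <= x1:               # segment spans the obstacle
            t = (obstacle_x - x0) / (x1 - x0)
            y = y0 + t * (y1 - y0)               # height at the crossing
            return 'above' if y > 0 else 'below'
    return 'no-crossing'

def representatives(trajs):
    """Retract a trajectory library to one representative per class."""
    reps = {}
    for tr in trajs:
        reps.setdefault(homotopy_class(tr), tr)
    return reps

trajs = [
    [(-1.0, 0.1), (0.0, 0.5), (1.0, 0.1)],    # passes above
    [(-1.0, -0.1), (0.0, -0.5), (1.0, 0.0)],  # passes below
    [(-1.0, 0.2), (0.5, 1.0), (1.0, 0.2)],    # also above
]
reps = representatives(trajs)   # three paths collapse to two classes
```

The representatives then play the role of motion primitives: planning can reason over 'above' vs. 'below' instead of over individual trajectories.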
Finally, we develop techniques for solving a family of related, but different, problem
instances through policy reuse of a finite policy library acquired over the agent’s lifetime.
This represents an alternative approach when traditional methods such as hierarchical
reinforcement learning are not computationally feasible. We abstract the policy space
using a non-parametric model of performance of policies in multiple task instances, so
that decision making is posed as a Bayesian choice regarding what to reuse. This is
one approach to transfer learning that is motivated by the needs of practical long-lived
systems. We show the merits of such Bayesian policy reuse in simulated real-time
interactive systems, including online personalisation and surveillance.
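The reuse decision can be sketched as a belief over latent task types combined with a tabulated performance model for the policy library. All names and numbers below are invented, and the Gaussian observation model is one simple choice, not necessarily the one used in the thesis.

```python
import math

# Hypothetical performance model: expected payoff of each library
# policy on each latent task type.
PERF = {
    'cautious':   {'taskA': 0.9, 'taskB': 0.2},
    'aggressive': {'taskA': 0.3, 'taskB': 0.8},
}

def select_policy(belief):
    """Pick the library policy with highest expected payoff under the belief."""
    return max(PERF, key=lambda p: sum(belief[t] * PERF[p][t] for t in belief))

def update_belief(belief, policy, reward, noise=0.1):
    """Bayes update of the task belief from an observed payoff, assuming
    a Gaussian observation model around the tabulated performance."""
    post = {}
    for t, prior in belief.items():
        lik = math.exp(-((reward - PERF[policy][t]) ** 2) / (2 * noise ** 2))
        post[t] = prior * lik
    z = sum(post.values())
    return {t: v / z for t, v in post.items()}

belief = {'taskA': 0.5, 'taskB': 0.5}
chosen = select_policy(belief)            # tie under a uniform belief
belief = update_belief(belief, chosen, reward=0.85)
# A high payoff is strong evidence for taskA, so the cautious
# policy remains the Bayes-optimal choice to reuse.
```

This captures the core loop: act with the currently best policy, observe its payoff, update the belief about which task instance the agent is facing, and reselect.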
Scaled-up Discovery of Latent Concepts in Deep NLP Models
Pre-trained language models (pLMs) learn intricate patterns and contextual
dependencies via unsupervised learning on vast text data, driving breakthroughs
across NLP tasks. Despite these achievements, these models remain black boxes,
necessitating research into understanding their decision-making processes.
Recent studies explore representation analysis by clustering latent spaces
within pre-trained models. However, these approaches are limited in
scalability and in the scope of interpretation by the high computational cost
of clustering algorithms. This study focuses on comparing clustering algorithms
for the purpose of scaling encoded concept discovery of representations from
pLMs. Specifically, we compare three algorithms in their capacity to unveil the
encoded concepts through their alignment to human-defined ontologies:
Agglomerative Hierarchical Clustering, Leaders Algorithm, and K-Means
Clustering. Our results show that K-Means has the potential to scale to very
large datasets, enabling rich latent concept discovery at both the word and
phrase levels.
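The K-Means step can be sketched with a minimal pure-Python Lloyd's algorithm on toy 2D points standing in for token representations; a real study would use an optimised library implementation on high-dimensional pLM embeddings, and the data here is invented.

```python
def kmeans(points, k, iters=20):
    """Minimal Lloyd's algorithm: a toy stand-in for clustering token
    representations from a pLM into latent concepts."""
    centers = points[:k]   # deterministic init for the sketch: first k points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: each centre moves to its cluster's mean.
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

# Toy "embeddings": two well-separated groups of token vectors.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
groups = kmeans(pts, k=2)   # recovers the two groups
```

Lloyd's iterations cost O(nk) distance computations per pass, which is what lets K-Means scale where agglomerative clustering's O(n^2) pairwise distances do not.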
A Two-Stage Optimization-based Motion Planner for Safe Urban Driving
Recent road trials have shown that guaranteeing the safety of driving
decisions is essential for the wider adoption of autonomous vehicle technology.
One promising direction is to pose safety requirements as planning constraints
in nonlinear, non-convex optimization problems of motion synthesis. However,
many implementations of this approach are limited by uncertain convergence and
local optimality of the solutions achieved, affecting overall robustness. To
improve upon these issues, we propose a novel two-stage optimization framework:
in the first stage, we find a solution to a Mixed-Integer Linear Programming
(MILP) formulation of the motion synthesis problem, the output of which
initializes a second Nonlinear Programming (NLP) stage. The MILP stage enforces
hard constraints of safety and road rule compliance, generating a solution in
the right subspace, while the NLP stage refines the solution within the safety
bounds for feasibility and smoothness. We demonstrate the effectiveness of our
framework via simulated experiments of complex urban driving scenarios,
outperforming a state-of-the-art baseline in metrics of convergence, comfort
and progress.
Comment: IEEE Transactions on Robotics (T-RO), 202
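The warm-start pattern behind the two-stage design can be sketched in a toy 1D setting: a discrete feasibility search plays the role of the MILP, and projected gradient descent the role of the NLP refinement. All names, costs, and the obstacle interval are invented; the paper's stages solve far richer motion-synthesis problems with real solvers.

```python
def cost(y, ref=0.0):
    """Smooth comfort cost: stay close to the reference path."""
    return (y - ref) ** 2

def stage1_discrete(candidates, obstacle):
    """Coarse stage (standing in for the MILP): among a discrete set of
    lateral offsets, return the cheapest one satisfying the hard
    no-collision constraint, putting the refinement in the right subspace."""
    lo, hi = obstacle
    feasible = [y for y in candidates if not (lo <= y <= hi)]
    return min(feasible, key=cost)

def stage2_refine(y, obstacle, step=0.1, iters=100):
    """Refinement stage (standing in for the NLP): projected gradient
    descent on the smooth cost, kept outside the blocked interval."""
    lo, hi = obstacle
    for _ in range(iters):
        y -= step * 2 * y                  # gradient of (y - 0)^2
        if lo <= y <= hi:                  # project back to the safe set
            y = lo if abs(y - lo) < abs(y - hi) else hi
    return y

obstacle = (-0.5, 0.5)   # lateral band blocked by another vehicle
coarse = stage1_discrete([-2.0, -1.0, 1.0, 2.0], obstacle)   # -> -1.0
refined = stage2_refine(coarse, obstacle)   # settles at the safe boundary
```

The discrete stage guarantees the hard constraint holds before refinement begins, so the smooth stage only ever moves within the safe set; that division of labour is the essence of the two-stage framework described above.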